Makes document attachment usage consider external attachments #1600

Spoffy · 2025-05-05T22:19:22Z

Context

Documents' attachment usage currently only considers files which are stored internally.

If a document has external attachments, they won't show in the document's "Size of attachments" bar in "Raw Data".

Proposed solution

This updates the attachment usage calculations to use _grist_Attachments.fileSize for external attachments, while preserving the existing logic for internal attachments.

Missing attachments

If a document has external attachments and is uploaded to a Grist instance, these missing attachments will still count towards the document's attachment usage. There's no easy way to mitigate this in all situations without checking all attachments on document open.

Untrusted file size value

This creates a problem with imported documents, which may have an incorrect fileSize value if it has been manually altered (i.e - the value is untrusted).

To mitigate this, the fileSize value is now updated when attachments are uploaded as a .tar archive. This means that eventually, fileSize is always valid for files which are present in external storage.

Related issues

#1021

Has this been tested?

👍 yes, I added tests to the test suite

Spoffy · 2025-05-05T22:20:38Z

This is dependent on the UI PR being merged.

This needs to be rebased once that is merged.

paulfitz

I was wondering if we need some tests on just adding and removing attachments and checking that stats update correctly, when those attachments are external. I see some attachment accounting tests in test/nbrowser/DocumentUsage.ts in grist-saas. If those aren't entangled too much with billing would it be worth bringing them across? Or are there already tests?

paulfitz · 2025-05-08T13:23:06Z

app/server/lib/ActiveDoc.ts

+    const attachments = this.docData?.getMetaTable("_grist_Attachments").getRecords();
+    for (const attachmentRec of attachments ?? []) {
+      const newSize = newFileSizesByFileIdent.get(attachmentRec.fileIdent);
+      if (newSize) {


How about checking if the size is different from the old recorded size? I believe the data engine will prune changes where a value is written to a cell that already contains that value, but seems easy to cut it out at the source here.

berhalak · 2025-05-12T07:31:11Z

test/nbrowser/DocumentUsage.ts

+import {server} from 'test/nbrowser/testServer';
+import {setupTestSuite} from 'test/nbrowser/testUtils';
+
+describe('DocumentUsage', function() {


Can you move this test over using DIFF? I see that you've changed the content of this file, and I don't have tools to show me the what was changed.

I can tag which bits are the same? I've done a restructuring on this file to make testing with external attachments easier, so I'm not sure a diff would show many useful things.

I just did a rebase that should make this easier.

This commit has all the DocumentUsage changes in:
c90b777

New commit hash, had to do a rebase to sort a merge issue:
aacb5ee

Can you maybe add another file for testing only external attachments and leave this one untouched?

Most of these tests fail if untouched 😅

I can isolate and duplicate the attachments tests if you want, and ensure they pass under both external and internal.

But I'm also not seeing the harm in having a pure-core version of this file? Happy to discuss further.

After discussion:

Several of the non-attachments related tests will be moved to a new PR

This file will be renamed

berhalak · 2025-05-12T07:31:53Z

test/nbrowser/DocumentUsage.ts

+  async function makeSessionAndLogin() {
+    // login() needs an options object passing to bypass an optimization that causes .login()
+    // to think we're already logged in when we're not after using `server.restart()`.
+    // Without this we end up with bad credentials.


I think you end up with old credentials, not bad. You can be logged into grist using multiple accounts at the same time, this is a feature :).

I've done a slight rephrasing.

The original session ends up with old credentials, you're right! But any new session ends up with bad credentials - specifically the bearer token ends up as api_key_for_chimpy, which just gives us the wrong responses from the API 😅

Spoffy · 2025-05-12T14:26:59Z

test/nbrowser/AttachmentsTransfer.ts

Hide whitespace changes for this diff - it'll make it much smaller. A lot of it was just indentation changes.

# Conflicts: # test/nbrowser/DocumentUsage.ts

# Conflicts: # test/nbrowser/AttachmentsTransfer.ts

berhalak · 2025-05-20T13:34:33Z

test/nbrowser/gristUtils.ts

@@ -3618,6 +3620,59 @@ export function withEnvironmentSnapshot(vars: Record<string, any>) {
  });
 }

+export function enableExternalAttachments(transferDelay?: string, preserveEnvVars = true) {


Haven't found usage of the second argument, and this whole function looks very similar to the one above.

What do you think about:
Changing line 3611 to:
process.env[key] = typeof vars[key] === 'function' ? await vars[key]() : vars[key];

Changing this function to

export function enableExternalAttachments(transferDelay?: string) { withEnvironmentSnapshot({ GRIST_EXTERNAL_ATTACHMENTS_MODE: 'test', GRIST_TEST_TRANSFER_DELAY: transferDelay ?? "", GRIST_TEST_ATTACHMENTS_DIR: async () => mkdtemp(path.join(await createTmpDir(), 'attachments')) }); }

And just reading process.env.GRIST_TEST_ATTACHMENTS_DIR whenever we need this information in tests?

The main difference between this function and withEnvironmentSnapshot is it only touches specific variables, rather than the whole environment.

The reason is there's a lower chance of accidental issues / conflicts this way, such as when multiple environment snapshots are in use!

That's also why it has that second parameter - to disable the environment restore if it's causing problems.

Let me see if I can deduplicate this a bit though, if that's the main issue here.

I've integrated all the behaviour I wanted into EnvironmentSnapshot and withEnvironmentSnapshot.

You can now pass them specific variables you want to save/set, and it'll save / restore only those variables, minimising the chance of conflicting changes.

The key reason I want this here specifically is because with gu.enableExternalAttachments, it's not apparent that the environment is being modified - increasing the risk of an accidental problem.

Ok, so sorry for the comment above, I'm lost here a little bit with the logic you created. For me this was a simple stack, there was only one env at a time, it was a snapshot not a patch, there are no multiple environment in use (I don't know even what that means for you).

I don't get it, why do you need to touch only specific variables, and how do you create multiple process at the same time? I tested my proposition before, and it worked fine.

Can you revert those changes to the one you had before? Now it is very hard to understand what is going on here.

As discussed, I've moved the helper out to its own file. :)

berhalak · 2025-05-20T13:49:48Z

app/server/lib/ActiveDoc.ts

+      }
+    }
+
+    const action: BulkUpdateRecord = ['BulkUpdateRecord', '_grist_Attachments', rowIdsToUpdate, {


Is it worth checking if rowIdsToUpdate is empty and returning earlier (without all external calls)?

berhalak · 2025-05-20T13:50:08Z

app/server/lib/ActiveDoc.ts

+    await this._applyUserActionsWithExtendedOptions(
+      docSession,
+      [action],
+      { attachment: true },


Suggested change

{ attachment: true },

{attachment: true},

Saas linter will mark this. We have a rule: objects without spaces, scopes with spaces.

Thanks for the warning, might be worth bringing that rule over to core!

Please do, I already tried and failed, but don't remember the specifics.

berhalak · 2025-05-20T15:12:23Z

test/nbrowser/AttachmentsTransfer.ts

 import path from 'path';
 import * as gu from 'test/nbrowser/gristUtils';
 import {fileDialogUpload, TestUser} from 'test/nbrowser/gristUtils';
-import {server, setupTestSuite} from 'test/nbrowser/testUtils';
-import * as testUtils from 'test/server/testUtils';
+import {setupTestSuite} from 'test/nbrowser/testUtils';


This is just a refactor, am I right? I dont see any actual code change.

Yeah - the uses for server and testUtils were replaced by a call to gu.enableExternalAttachments

berhalak · 2025-05-20T15:14:22Z

test/nbrowser/DocUsageTracking.ts

+
+  it('shows usage stats on the raw data page', async function() {
+    await session.tempNewDoc(cleanup, "EmptyUsageDoc");
+    await testDocUsageStatsAreZero();


I think it is better to inline this method. It is not used anywhere else.

Keep this one as is - it's not used here right now, because I moved the extra tests to another branch.

When I open a PR for that, this will be used in multiple places again :)

Hmmm, we usually avoid doing this that way. If it is not used right now in multiple plays, lets introduced this abstraction later on.

Alright, will pull this out.

berhalak · 2025-05-21T11:45:16Z

app/server/lib/ActiveDoc.ts

+      fileSize: newFileSizesForRows
+    }];
+
+    await this._applyUserActionsWithExtendedOptions(


What do you think about wrapping it with try catch ? Asking because, this will throw if ids are not valid, it is very unlikely but still possible if the docData we have in memory is not exactly the same as the one in data engine, everything here is async, so maybe it can somehow interfere with other modifications.

Ideally this would all be done in "transaction like" scope, but I don't know if it is important to worry about that here. We have some mechanism for locking the doc modifications, but maybe this is overkill here. But there is other mechanism available - the upsert action BulkAddOrUpdateRecord, where you can update those values based on the fileIdentity, not the ids.

This is a difficult question. The options seem to be:

Don't catch, fail to import the attachments

Catch, import the attachments with an incorrect file size

If this is very rare, I think I'd prefer to let it fail and force the user to re-import, than import attachments with an invalid file size.

I don't think upsert is an option unfortunately, as it's not correct to add a new record if it doesn't exist. Best case, we have an extra attachment being tracked that doesn't exist. Worst case, it breaks other code because it's not a fully valid record that's added (unless we try to reinsert the whole record...).

My inclination is to leave it, add a comment, see if it actually comes up in production.

How do other parts of the codebase handle the in-memory data and data engine being out of sync? Just locking?

The minimum was to wrap this in try catch, and rethrow something more meaningful, explaining what has happen.

I don't know for sure, but we have some data corruptions problems we don't know how to fix or don't know the reason. So here, we know that the ids can go wrong, so we can anticipate it and explain in more details.
Since we don't do any performance or real load tests, waiting for this to happen on prod is a wrong idea, we don't know anything about how this would work on prod.

I just checked the docstring for the BulkAddOrUpdateRecord, there is a on option parameter that disables adding if it can't find records to update, so I think it actually does look as a good option, wdyt? Here is a docstring:

`options` is a dictionary with optional settings to choose other behaviours: - Set "on_many" to "all" or "none" to change which records are updated when several match. - Set "update" or "add" to False to disable updating or adding records respectively, i.e. if you only want to add records that don't already exist or if you only want to update records that do already exist. - Set "allow_empty_require" to True to allow `require` to be an empty dictionary, which would mean that every record in the table is matched. Otherwise this will raise an error to prevent mistakes like updating an entire column.

I tried this, but apparently that method doesn't support metatables ☹️

Instead, I've wrapped it in a more specific error, and added some logging around that error too. How does that look?

berhalak · 2025-05-21T11:46:50Z

test/nbrowser/DocUsageTracking.ts

+  gu.enableExternalAttachments();
+
+  async function makeSessionAndLogin() {
+    // login() needs an options object passing to bypass an optimization that causes .login()


I think this is not needed now, right? I think we fixed this issue.

The problem was avoided in a different way, by not restarting the server except at the start of the test file.

I can remove this as a result, but at the same time, it would be nice to document this behaviour somewhere. But not sure where.

But I meant, that we actually fixed this issue in other PR, but I can't find it now.

My memory was that this was intended behaviour and we weren't changing it, but my memory might be wrong! :)

test/nbrowser/DocUsageTracking.ts

berhalak

Are those tests failures related to this PR?

This reverts commit aa16df7.

berhalak · 2025-06-03T13:36:38Z

test/nbrowser/DocUsageTracking.ts

+  enableExternalAttachmentsForTestSuite
+} from 'test/nbrowser/externalAttachmentsHelpers';
+
+describe('DocumentUsage', function() {


Suggested change

describe('DocumentUsage', function() {

describe('DocUsageTracking', function() {

The convention is to have the same name as the file.

berhalak · 2025-06-03T13:41:01Z

test/nbrowser/DocUsageTracking.ts

+      await api.applyUserActions(docId, [['AddEmptyTable', "AttachmentsTable"]]);
+      await gu.getPageItem('AttachmentsTable').click();
+      await gu.waitForServer();
+      await addAttachmentColumn('Attachments');


Suggested change

await api.applyUserActions(docId, [['AddEmptyTable', "AttachmentsTable"]]);

await gu.getPageItem('AttachmentsTable').click();

await gu.waitForServer();

await addAttachmentColumn('Attachments');

await gu.sendActions([['AddEmptyTable', "AttachmentsTable"]]);

await gu.getPageItem('AttachmentsTable').click();

await gu.addColumn('Attachments', 'Attachment');

When you change the document through the API, you are not waiting for those change to propagate back to the browser. It works 99% of time, but introduce flakines.

The second waitForServer is not needed I think, is it?

Your helper for adding attachments column is doing the same thing that this call:
await gu.addColumn('Attachments', 'Attachment');

Thanks for the advice, will put in these changes.
Not entirely sure if it's needed or not. I'll double check.

Made these changes, you're 100% right, that second waitForServer isn't needed - gu.sendActions waits :)

berhalak · 2025-06-03T13:49:24Z

test/nbrowser/DocUsageTracking.ts

+  await driver.find('.test-right-tab-field').click();
+  await gu.addColumn(columnName);
+  await gu.setType(/Attachment/);
+}


This helper is not needed, there is a type parameter in the addColumn, which does the same thing as you.

Great to know, removed it :)

app/server/lib/ActiveDoc.ts

Spoffy added Attachments bug Something isn't working and removed bug Something isn't working labels May 5, 2025

Spoffy changed the base branch from main to spoffy/archive-ui May 5, 2025 22:20

Spoffy marked this pull request as ready for review May 5, 2025 23:28

berhalak self-assigned this May 6, 2025

Spoffy requested a review from paulfitz May 6, 2025 15:30

paulfitz reviewed May 8, 2025

View reviewed changes

berhalak self-requested a review May 9, 2025 08:39

berhalak requested changes May 12, 2025

View reviewed changes

Spoffy force-pushed the spoffy/attachment-size-calc branch from e0a0f53 to 871b9e1 Compare May 12, 2025 14:19

Spoffy commented May 12, 2025

View reviewed changes

Spoffy force-pushed the spoffy/attachment-size-calc branch from 6e4d36f to b2c0483 Compare May 12, 2025 14:33

Base automatically changed from spoffy/archive-ui to main May 12, 2025 14:42

Spoffy added 9 commits May 12, 2025 16:49

Updates attachment size calculation to consider external

3938de5

Updates attachment size on restored

9b5638b

Adds tests for attachment file sizes being restored

49e5890

Extracts file size updating into method

b5c102d

Removes archive determinism change

09692d4

Makes attachment transfer only update file size if changed

922a781

Adds DocumentUsage.ts

85e1469

Adds document usage tests

aacb5ee

# Conflicts: # test/nbrowser/DocumentUsage.ts

Adds helpers for testing external attachments

bc5ae34

# Conflicts: # test/nbrowser/AttachmentsTransfer.ts

Spoffy force-pushed the spoffy/attachment-size-calc branch from b2c0483 to bc5ae34 Compare May 12, 2025 15:51

Spoffy added 2 commits May 13, 2025 14:24

Reduces DocumentUsage tests to bare minimum

8776ddb

Renames DocumentUsage.ts to DocUsageTracking.ts

eb2e61d

Spoffy requested a review from berhalak May 20, 2025 13:08

berhalak reviewed May 21, 2025

View reviewed changes

Avoid accessing DB if nothing to do

572b042

Spoffy added 5 commits May 31, 2025 17:50

Avoids SaaS linter error

90889ae

Renames doc in test

545b753

Removes unnecessary login fix

3d4c312

Integrates env saving into existing funcs

aa16df7

Merge branch 'main' into spoffy/attachment-size-calc

a3d5583

berhalak self-requested a review June 2, 2025 08:31

berhalak requested changes Jun 2, 2025

View reviewed changes

Spoffy added 5 commits June 2, 2025 16:04

Merge branch 'main' into spoffy/attachment-size-calc

58c797f

Revert "Integrates env saving into existing funcs"

054792a

This reverts commit aa16df7.

Refactors external attachments helper into own file

e8acbee

Inlines function in DocUsageTracking tests

e6f70e8

Adds explicit error for sandbox problems during attachment size update

a55f7e6

Spoffy requested a review from berhalak June 2, 2025 20:37

berhalak requested changes Jun 3, 2025

View reviewed changes

Spoffy added 2 commits June 3, 2025 17:32

Cleans up DocUsageTracking tests

6a75682

Rethrows errors during file size update

e121a2b

Spoffy requested a review from berhalak June 3, 2025 18:00

berhalak approved these changes Jun 5, 2025

View reviewed changes

Spoffy merged commit 0d159d1 into main Jun 5, 2025
22 of 23 checks passed

Spoffy deleted the spoffy/attachment-size-calc branch June 5, 2025 20:39

	describe('DocumentUsage', function() {
	describe('DocUsageTracking', function() {

Uh oh!

Makes document attachment usage consider external attachments #1600

Makes document attachment usage consider external attachments #1600

Uh oh!

Conversation

Spoffy commented May 5, 2025

Context

Proposed solution

Missing attachments

Untrusted file size value

Related issues

Has this been tested?

Uh oh!

Spoffy commented May 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

paulfitz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Spoffy May 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

berhalak Jun 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

berhalak Jun 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Spoffy commented May 5, 2025 •

edited

Loading

Spoffy May 31, 2025 •

edited

Loading

berhalak Jun 2, 2025 •

edited

Loading

berhalak Jun 2, 2025 •

edited

Loading

berhalak Jun 3, 2025 •

edited

Loading